\begin{aquote}{Peter T. Daniels \cite{DanielsBright1996}}
``Humankind is defined by language; but civilization is defined by writing.''
\end{aquote}
Writing is one of the most distinguishing features of humankind
and has been practised for thousands of years.  It has enabled us to record history
and to pass on knowledge from generation to generation. Throughout this time,
systems of writing have continually surfaced, vanished, evolved, and merged. This
ongoing process has left us with an enormous variety of writing
systems.

Apart from naturally formed systems, quite a few artificial systems have
been created.  In these cases, the language in question typically had no script
of its own up to that point, and in most cases the creation of such a writing system was
commissioned by a governing power. %Perhaps the most significant example of this
%is Hangul, the official featural script of Korea. Created in 1443 by sage King Seycong,
%it has been the native Korean script ever since, and is one of the most
%scientifically designed scripts in the world \cite{DanielsBright1996}.
Examples include the official script of Korea, called Hangul, and the Cherokee syllabary,
created for recording the language of the same name.

For natural and artificial writing systems alike, a distinction can
be made between systems whose symbols directly
represent concepts or ideas (\emph{pictograms}) and systems
whose more abstract glyphs represent some form of \emph{language}.  A
definition that serves the purpose of this paper well is the definition of
writing posited by Daniels and Bright
\cite{DanielsBright1996}:
\begin{quote}
``\ldots writing is defined as \emph{a system of more or less permanent marks
used to represent an utterance in such a way that it can be recovered more or
less exactly without the intervention of the utterer\ldots}''
\end{quote}
This definition excludes pictograms from writing.
It describes the highest-level properties of writing systems, i.e.,
the properties that all writing systems have in common.  It thereby lays the foundation
for the family tree in which all character sets can be placed.

Of course, this
definition leaves considerable room for variation.  Writing systems can vary
in the way the glyphs are connected to the language. For example, in a
\emph{logosyllabary} each glyph typically represents a complete word or morpheme,
whereas in an \emph{alphabet} each glyph represents either a vowel or a
consonant.
Apart from such linguistic differences, character sets can vary strongly in
their graphical appearance. Indeed, the difference in shape between the glyphs
of a logosyllabary such as Kanji and those of an alphabet such as the Hebrew alphabet
could hardly be more pronounced.

Nevertheless, there is a limit to this
variation. After all, not every random arrangement of pixels would be
considered a character. Asked to identify which of the two
images in Figure~\ref{fig:qrvschar} is the character, practically anyone would be
able to do so without any prior knowledge of the writing system from
which the character stems.

\begin{figure}[!h]
  \centering
  \subfigure[QR Code]{\includegraphics[width=0.2\textwidth]{img/qrExample}}
  \subfigure[Georgian character]{\includegraphics[width=0.2\textwidth]{img/georgianExample}}
  \caption{QR code vs. letter from Georgian alphabet. To human beings the difference is obvious.}
  \label{fig:qrvschar}
\end{figure}

Such structural restrictions suggest that, for a given canvas size, the
sample space of characters is much smaller than the sample space of all
possible images, and that one should be able to train a classifier on purely
morphological features (i.e., appearance) to tell the difference between glyphs
and non-glyphs.
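
To give a rough sense of the sizes involved, consider, purely as an
illustrative assumption and not as a description of the canvas used in this
work, a binary canvas of $n \times n$ pixels. The number of distinct images on
such a canvas is
\[
  N_{\mathrm{img}}(n) = 2^{n^{2}},
  \qquad \mbox{so that} \qquad
  N_{\mathrm{img}}(16) = 2^{256} \approx 1.16 \times 10^{77}.
\]
The symbol $N_{\mathrm{img}}$ and the choice $n = 16$ are introduced here for
illustration only. Even on this small canvas, the images that a reader would
accept as characters can make up only a tiny fraction of the total.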

On a lower level, it is perhaps more interesting to look at morphological
differences between writing systems, especially ones that are alike in
appearance. Taking, for example, the Latin and Greek alphabets, one would agree
that they are visually more similar than the previously mentioned combination
of Hebrew and Kanji. Still, the question can be posed whether there is
some inherent visual property of the two systems that sets them apart.  In
more technical terms, is it possible to train a classifier on a set of purely
morphological features extracted from glyphs of different writing systems to
distinguish between such systems? Answering this question constitutes one part
of this work.

% SYNTHESIS
The knowledge gathered from this analysis, besides serving for the
classification of writing systems, has another application. If the distribution
of feature values that glyphs from a certain writing system take is known, an
optimization algorithm can be used to synthesize new glyphs. These glyphs, in
the ideal case, would then be perceived as belonging to the writing system from
which the original features were extracted. More realistically, these
synthesized glyphs should approximate the original writing system's physical
properties.

%This paper is concerned with the analysis of writing systems. Although the differences
%between certain writing styles (fonts) are touched upon, it is not the main purpose of this work.
%Consequently, to minimize the influence of font design, we limit our analysis to one specific
%font. This font contains glyphs for all writing systems discussed.

In the next chapter, relevant studies from related areas are
discussed.  The results of these studies lay the foundations upon
which this work is built. Chapter~\ref{chp:method} deals with the methodology
of this paper's approach.  The technical details of the implementation are
discussed in Chapter~\ref{chp:impl}.  After the results are presented in
Chapter~\ref{chp:results}, some topics for discussion are raised in
Chapter~\ref{chp:discussion}. Finally, Chapter~\ref{chp:conclusion} concludes this
work.

A full listing of the code used can be found in Appendix~\ref{chp:code}.

%In-depth analysis of different writings systems has been carried out from the
%perspective of linguistics on numerous occasions, with one of the most
%comprehensive studies being the previously cited work by Daniels and Bright
%\cite{DanielsBright1996}.  To our knowledge, however, a computational approach
%to writing system classification and analysis is missing.
%
%If so, one should be able to come up with a computable factor that can classify a
%certain glyph as belonging to either the Greek or the Latin alphabet.
%
%In this paper we will investigate whether such a computable factor can be found.
%We will use (relatively) easily computable features and use tested supervised
%learning methods to create a classifier capable of distinguishing between a
%numerable set of different glyph sets.
%Seen from a slightly
%lower level, one can distinguish half a dozen significantly different types of
%writing system. These range from the \emph{alphabet}, where each symbol represents a
%consonant or a vowel, to the \emph{logosyllabary}, where each symbol represents
%a specific word.



